Add buf as an alternate Python protobuf code generator #23343

Open
mfairley wants to merge 1 commit into pantsbuild:main from mfairley:buf

@mfairley mfairley commented May 9, 2026

🧪 Try it: the example repo at https://github.com/mfairley/example-buf
exercises this PR end-to-end — a Python ASGI server using ConnectRPC +
protovalidate, plus a TypeScript client driven by the same buf.lock via
npm workspaces. Clone it as a sibling of your local pants checkout and
run ./pants_from_sources test :: to see the new flow in action.

Background

buf is the modern toolchain for Protocol Buffers, replacing
ad-hoc protoc invocations with a coherent ecosystem:

  • buf.yaml / buf.lock declare and pin proto module dependencies (the
    Buf Schema Registry — BSR — analogue to a package index).
  • buf.gen.yaml centrally configures codegen plugins. Plugins can be
    remote: (hosted on the BSR, fetched with version+revision pins) or
    protoc_builtin: / local: (plain protoc-style binaries on PATH).
  • Modern protocol stacks like ConnectRPC and
    protovalidate are first-class buf citizens —
    they're not really usable through the existing protoc backend without
    out-of-graph workarounds.

For Pants users, the existing pants.backend.codegen.protobuf.python backend
already supports buf format / buf lint, but codegen is protoc-driven.
This PR adds the codegen half: a protobuf_generator='buf' target field that
opts a protobuf_source into running through buf generate instead of
protoc. The protoc path is unchanged when the field is left at its
default.

Concrete benefits Pants users get:

  • ConnectRPC + protovalidate support out of the box. Inference picks up
    connectrpc and protovalidate runtime deps automatically based on which
    plugins appear in buf.gen.yaml.
  • buf.lock is the single source of truth for BSR commits.
    pants generate-lockfiles regenerates it via buf dep update. Codegen is
    blocked early with a fix-instruction error if buf.yaml declares deps:
    without a sibling buf.lock, so reproducibility isn't optional.
  • Hermetic plugin pinning, mirroring how Pants enforces version pins for
    binary tools elsewhere. remote: plugins must declare version +
    revision; Pants's built-in DEFAULT_PLUGIN_PINS synthesizes pins for
    popular plugins so users don't have to look them up by hand.
  • Per-target overrides via the new buf_gen_template field, so
    individual proto targets can use a different buf.gen.yaml without forking
    the global config.
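
The pinning rule above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's code: `synthesize_pins`, `EXAMPLE_PINS`, and `UnpinnedPluginError` are hypothetical names, the version and revision values are placeholders, and the input is assumed to be a buf.gen.yaml already parsed into dicts where the version is a `:suffix` on `remote:` and `revision:` is a separate key.

```python
from copy import deepcopy

# Hypothetical registry: plugin name -> (version, revision).
# Placeholder values, not the real contents of DEFAULT_PLUGIN_PINS.
EXAMPLE_PINS = {
    "buf.build/protocolbuffers/python": ("v0.0.0", 1),
    "buf.build/connectrpc/python": ("v0.0.0", 1),
}


class UnpinnedPluginError(ValueError):
    """A remote plugin is not fully pinned and no pin can be synthesized."""


def synthesize_pins(buf_gen: dict, pins: dict) -> dict:
    """Return a copy of a parsed buf.gen.yaml with every remote plugin pinned."""
    result = deepcopy(buf_gen)  # never mutate the caller's config
    for plugin in result.get("plugins", []):
        remote = plugin.get("remote")
        if remote is None:
            continue  # protoc_builtin: / local: plugins are left untouched
        name, _, version = remote.partition(":")
        has_revision = "revision" in plugin
        if version and has_revision:
            continue  # the user already pinned both parts
        if version or has_revision:
            raise UnpinnedPluginError(f"{name}: declare both version and revision")
        if name not in pins:
            raise UnpinnedPluginError(f"{name}: no known pin, add one explicitly")
        pinned_version, pinned_revision = pins[name]
        plugin["remote"] = f"{name}:{pinned_version}"
        plugin["revision"] = pinned_revision
    return result
```

Rejecting a half-pinned plugin instead of mixing a user-supplied version with a registry revision is a design choice of this sketch; the PR may handle that case differently.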

example-buf is a full working
reference repo with buf.yaml + buf.lock + buf.gen.yaml, a Python
ConnectRPC server, validation via protovalidate, and a TypeScript client that
shares the same idl/ and buf.lock.

Usage

Opting in is one field on the proto target:

# idl/acme/greeter/v1/BUILD
protobuf_sources(
    protobuf_generator="buf",
)

With a typical layout:

buf.yaml                  # version: v2; modules: [{path: idl}]; deps: [...]
buf.lock                  # generated by `pants generate-lockfiles --resolve=buf`
buf.gen.yaml              # plugins (protocolbuffers/python, connectrpc/python, etc.)
idl/acme/greeter/v1/greeter.proto

What changes for the user, end-to-end:

  • pants export-codegen :: now runs buf generate for buf targets and
    protoc for everything else. The two paths produce the same kinds of
    artifacts (*_pb2.py, *_pb2_grpc.py, etc.), so downstream targets don't
    need to know which generator was used.
  • Module mapping reads buf.gen.yaml to learn which suffixes (_pb2,
    _pb2_grpc, _grpc, _connect) the configured plugins emit, and registers
    the corresponding modules per proto. New: BSR-dep-provided modules
    (buf.validate.validate_pb2, google.protobuf.timestamp_pb2, etc.) are
    also registered, gated on include_imports: true being set on the
    protocolbuffers/python plugin (so we only claim ownership when the file
    is actually generated).
  • Runtime-dep inference keys off buf.gen.yaml plugin presence rather
    than the existing grpc=True field. A connectrpc/python plugin produces
    _connect.py files, so we infer the connectrpc PyPI package; a
    grpc/python plugin → grpcio; a grpclib_python local plugin →
    grpclib. The protoc branch's grpc=True-driven inference is unchanged
    for non-buf targets.
  • pants generate-lockfiles now also resolves buf.yaml deps. Each
    buf.yaml is a resolve named after its parent dir (buf for repo-root).
    pants generate-lockfiles --resolve=buf runs buf dep update in a
    sandbox.
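
To make the module-mapping bullet concrete, here is a rough sketch of how module names could be derived from the configured plugins. The function names and both example tables are hypothetical stand-ins (the PR's DEFAULT_PLUGIN_SUFFIXES and DEFAULT_BSR_DEP_MODULES may look different), and the plugin entries are assumed to be already-parsed dicts from buf.gen.yaml.

```python
from pathlib import PurePosixPath

# Illustrative tables only; not the PR's actual registries.
EXAMPLE_SUFFIXES = {
    "protocolbuffers/python": ["_pb2"],
    "grpc/python": ["_pb2_grpc"],
    "connectrpc/python": ["_connect"],
}
EXAMPLE_BSR_MODULES = {
    "buf.build/bufbuild/protovalidate": ["buf.validate.validate_pb2"],
}


def plugin_id(plugin: dict) -> str:
    """Reduce 'buf.build/owner/name:version' to 'owner/name'."""
    remote = plugin.get("remote", "")
    return remote.removeprefix("buf.build/").partition(":")[0]


def configured_suffixes(plugins: list) -> list:
    """Collect the module suffixes emitted by the configured plugins, in order."""
    suffixes = []
    for plugin in plugins:
        for suffix in EXAMPLE_SUFFIXES.get(plugin_id(plugin), []):
            if suffix not in suffixes:
                suffixes.append(suffix)
    return suffixes


def modules_for_proto(proto_path: str, module_root: str, suffixes: list) -> list:
    """Map a .proto path under a buf module root to dotted Python module names."""
    rel = PurePosixPath(proto_path).relative_to(module_root)
    stem = ".".join(rel.with_suffix("").parts)
    return [f"{stem}{suffix}" for suffix in suffixes]


def bsr_dep_modules(plugins: list, deps: list) -> list:
    """Claim BSR-dep modules only when protocolbuffers/python generates imports."""
    generates_imports = any(
        plugin_id(p) == "protocolbuffers/python" and p.get("include_imports") is True
        for p in plugins
    )
    if not generates_imports:
        return []
    return [module for dep in deps for module in EXAMPLE_BSR_MODULES.get(dep, [])]
```

With the layout shown earlier, `modules_for_proto("idl/acme/greeter/v1/greeter.proto", "idl", ["_pb2", "_connect"])` yields `acme.greeter.v1.greeter_pb2` and `acme.greeter.v1.greeter_connect`.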

New options:

  • [buf].extra_plugin_pins — for remote: plugins not in
    DEFAULT_PLUGIN_PINS.
  • [python-protobuf].extra_buf_plugin_suffixes — for custom or forked
    plugins that emit _pb2 / etc. modules.
  • [python-protobuf].extra_buf_bsr_modules — for BSR deps not in
    DEFAULT_BSR_DEP_MODULES.

All three follow the same "registry + extras" pattern as
DEFAULT_MODULE_MAPPING / extra_module_mapping.
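
As a small illustration of the "registry + extras" pattern, the sketch below layers user-supplied extras over a read-only default table. The names are hypothetical, and letting extras win on a key collision is an assumption of this sketch rather than confirmed behavior of the PR.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical built-in registry, exposed read-only so callers cannot mutate it.
EXAMPLE_DEFAULTS = MappingProxyType({
    "connectrpc/python": ("_connect",),
    "grpc/python": ("_pb2_grpc",),
})


def with_extras(defaults, extras) -> dict:
    """Merge user extras over the defaults; extras take precedence on conflict."""
    return dict(ChainMap(dict(extras), defaults))
```

A lookup such as `with_extras(EXAMPLE_DEFAULTS, {"acme/custom": ("_acme",)})` then sees both the built-in entries and the custom one.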

A worked example of all of the above is in
example-buf — clone it next to
your pants checkout and run ./pants_from_sources to see Pants resolve
buf.lock, run buf-driven codegen, infer connectrpc + protovalidate
runtime deps, and execute pytest tests against the validator interceptor.

Code design

New module: src/python/pants/backend/codegen/protobuf/buf/. Houses
language-agnostic helpers (yaml parsing, plugin-id matching, pin synthesis,
lockfile rules) so future Go / JVM / etc. buf integrations don't have to
re-implement the plumbing.

  • buf/config.py — yaml parsing (parse_buf_yaml_module_paths,
    parse_buf_yaml_deps, parse_plugin_outs,
    python_pb2_include_imports); pin synthesis
    (synthesize_pinned_buf_gen_yaml + DEFAULT_PLUGIN_PINS); module-root
    resolution; per-target template-request resolvers; async digest fetchers
    (fetch_buf_layout, fetch_buf_gen_contents). Organized into clear
    sections (# ---- buf.yaml parsers ----, etc.) for top-down readability.
  • buf/subsystem.py — moved up from lint/buf/ since codegen needs it
    too. Adds extra_plugin_pins option.
  • buf/fields.py: BufGenTemplateField plugin field, registered on
    ProtobufSourceTarget / ProtobufSourcesGeneratorTarget.
  • buf/lockfile.py: KnownBufResolveNamesRequest /
    RequestedBufResolveNames / GenerateBufLockfile plus their rules,
    hooking into the standard pants generate-lockfiles machinery.
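
The resolve naming and the buf.lock pre-check described in this PR can be sketched as below. This is a guess at the shape of the logic, not the PR's rules: the helper names are hypothetical, the exception is a plain stand-in for MissingBufLockError, and using only the final directory component as the resolve name is an assumption.

```python
from pathlib import PurePosixPath


class MissingBufLockError(Exception):
    """Stand-in for the error raised when buf.yaml has deps but no buf.lock."""


def resolve_name(buf_yaml_path: str) -> str:
    """Name a resolve after the buf.yaml's parent directory ('buf' at repo root)."""
    parent = PurePosixPath(buf_yaml_path).parent
    return parent.name or "buf"


def check_lock(buf_yaml: dict, buf_yaml_path: str, files: set) -> None:
    """Fail early when buf.yaml declares deps: without a sibling buf.lock."""
    if not buf_yaml.get("deps"):
        return  # nothing to pin, so no lockfile is required
    lock_path = str(PurePosixPath(buf_yaml_path).with_name("buf.lock"))
    if lock_path not in files:
        raise MissingBufLockError(
            f"{buf_yaml_path} declares deps but {lock_path} is missing; run "
            f"`pants generate-lockfiles --resolve={resolve_name(buf_yaml_path)}`"
        )
```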

Python-specific bits:

  • python/buf_rules.py — the generate_python_from_protobuf_via_buf
    rule. Bundles buf + protoc into the sandbox (protoc so
    protoc_builtin: / local: plugins resolve), synthesizes a fully-pinned
    buf.gen.yaml, runs buf generate, and returns the digest. Pre-checks
    for buf.lock when buf.yaml declares deps:, raising
    MissingBufLockError with a pants generate-lockfiles --resolve=...
    pointer.
  • python/python_protobuf_subsystem.py — adds DEFAULT_PLUGIN_SUFFIXES,
    DEFAULT_BSR_DEP_MODULES, the extra_buf_* options, and the buf branch
    of runtime-dep inference. Subsystem booleans (grpcio_plugin,
    mypy_plugin, etc.) and the grpc=True field are warned-but-ignored on
    the buf path.
  • python/python_protobuf_module_mapper.py — extended to register
    per-proto modules from buf.gen.yaml + per-buf-module BSR-dep modules.
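
The buf branch of runtime-dep inference could look roughly like this. The table reflects the plugin-to-package pairs named in this description; the helper names, the `protoc-gen-` prefix handling for local plugins, and the dict-shaped plugin entries are assumptions of the sketch. How protovalidate is inferred is not spelled out above, so it is left out.

```python
# Plugin key -> PyPI package, per the pairs listed in this PR description.
EXAMPLE_RUNTIME_DEPS = {
    "connectrpc/python": "connectrpc",
    "grpc/python": "grpcio",
    "grpclib_python": "grpclib",
}


def plugin_key(plugin: dict):
    """Normalize a plugin entry to a lookup key, or None if it is not recognized."""
    remote = plugin.get("remote")
    if isinstance(remote, str):
        # 'buf.build/owner/name:version' -> 'owner/name'
        return remote.removeprefix("buf.build/").partition(":")[0]
    local = plugin.get("local")
    if isinstance(local, str):
        # 'protoc-gen-grpclib_python' -> 'grpclib_python'
        return local.removeprefix("protoc-gen-")
    return None


def inferred_runtime_deps(plugins: list) -> list:
    """Return the sorted, de-duplicated PyPI packages implied by the plugins."""
    packages = {
        EXAMPLE_RUNTIME_DEPS[key]
        for key in map(plugin_key, plugins)
        if key in EXAMPLE_RUNTIME_DEPS
    }
    return sorted(packages)
```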

Cache-isolation correctness: buf_rules.py builds the input digest from
transitive_targets.closure of the target's address. A monorepo with many
unrelated protos in the same buf module sends only the per-target closure
into the sandbox. Verified by test_buf_only_sends_transitive_closure_to_sandbox
(the test plants a malformed sibling proto; codegen succeeding for the unrelated
target proves the malformed file was filtered out).
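
The closure-isolation idea can be illustrated with a plain dependency graph. This sketch does not use the Pants engine API; `transitive_closure` and `sandbox_inputs` are hypothetical helpers over ordinary dicts, meant only to show why an unrelated (even malformed) sibling proto never reaches the sandbox.

```python
from collections import deque


def transitive_closure(root: str, deps: dict) -> set:
    """Breadth-first walk collecting the root and everything it depends on."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def sandbox_inputs(root: str, deps: dict, sources: dict) -> list:
    """Return only the source files owned by targets in the root's closure."""
    closure = transitive_closure(root, deps)
    return sorted(sources[target] for target in closure if target in sources)
```

For a graph where `greeter` depends on `common` and a broken `sibling` target lives in the same buf module, `sandbox_inputs("greeter", ...)` returns only the `greeter` and `common` files.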

Backward compatibility: the protobuf_generator field defaults to protoc,
so existing protobuf_sources(...) declarations are unaffected. The protoc
codegen rule, mapper rule, and runtime-dep inference all branch on the field
value before doing anything new.

Tests cover (buf-only — no protoc-side test additions):

  • yaml parsing, pin synthesis, BSR-dep registry, include_imports-gating
    (unit tests in buf/config_test.py)
  • lockfile rule planning (buf/lockfile_test.py)
  • module mapping for buf targets, including the BSR-dep registry on/off and
    extra_buf_bsr_modules (python/python_protobuf_module_mapper_test.py)
  • runtime-dep inference for the buf path
    (python/python_protobuf_subsystem_test.py)
  • end-to-end integration: codegen, per-target template override, remote
    plugin via BSR network fetch, missing buf.lock error, sandbox
    closure-isolation including same-BUILD-file scenarios
    (python/buf_rules_integration_test.py)

Related issues

  • Support generation of ConnectRPC Go code #20684 (open). Partially
    addressed: this PR ships the Python half (ConnectRPC + protovalidate via
    buf generate). The language-agnostic helpers in buf/config.py /
    buf/lockfile.py / buf/subsystem.py are the foundation a Go backend
    would slot into.
  • Protobuf linting (and more!) using Buf CLI #13189 (closed).
    Completes the codegen half of the original buf integration request. The
    lint/format half shipped years ago.
  • Support more Python Protobuf plugins #20383 (closed). Once a target
    opts into protobuf_generator='buf', plugin selection lives entirely in
    buf.gen.yaml — exactly the flexibility the OP asked for.

LLM assistance: Code primarily written by Claude Code, but I iterated on
the design with Claude. (Per the
contribution guide.)

@sureshjoshi
Member

Thanks for the PR!

Recently the maintainers have been discussing that new backends should first start out as published plugins in a user's repo, so that the community can mess around and flesh out the APIs and utility, before it gets merged into Pants directly and becomes more relied on.

I was just commenting on another PR yesterday that I need to add docs to this effect.

A couple simple examples of that are here:
https://github.com/sureshjoshi/pants-plugins/tree/main/pants-plugins/experimental/ty
https://github.com/sureshjoshi/pants-plugins/tree/main/pants-plugins/experimental/pyrefly

This would also let us extract bits and pieces at a time that we know are more API stable (e.g. once there are some lint users who are happy, then pull that in, etc).

@mfairley
Author

Thanks for the quick response @sureshjoshi! In this case, buf is already a backend in Pants (https://www.pantsbuild.org/prerelease/docs/python/integrations/protobuf-and-grpc#buf-format-and-lint-protobuf) but it's only used for linting and formatting.

This PR adds on the buf generate functionality as an alternative to using protoc. So Pants already uses buf but not for the actual code generation yet (though this is more complex than just linting/formatting).

Would it still make sense to start as a plugin in this case?

@sureshjoshi
Member

> Thanks for the quick response @sureshjoshi! In this case, buf is already a backend in Pants (https://www.pantsbuild.org/prerelease/docs/python/integrations/protobuf-and-grpc#buf-format-and-lint-protobuf) but it's only used for linting and formatting.
>
> This PR adds on the buf generate functionality as an alternative to using protoc. So Pants already uses buf but not for the actual code generation yet (though this is more complex than just linting/formatting).
>
> Would it still make sense to start as a plugin in this case?

Ahh, okay okay, I thought buf lint existed, then didn't see it in the docs - but I had just gotten confused. Then yeah, in that case, this is fair game to extend generation as well, as the tool is already here.

I'll just point out that this will take some time to review, and at a glance it feels like a lot of code for what it's doing, which is common for Claude-driven PRs, FYI.

So, thanks for the PR, and please have some patience on the review.

@mfairley
Author

Thanks @sureshjoshi! Quite a few of the changes are just mechanical -- moving files related to buf linting/formatting that were under a subdirectory that wouldn't make sense if buf also does code generation. I could split those into a separate PR but I didn't since there wouldn't be a justification for moving those files without adding the new functionality. Let me know if you have a preference. I can also just put them in a different commit within this PR if it makes review easier.

@edelmanjm

This would be awesome to have. I'll check this out and give it a try on our codebase.

@sureshjoshi
Member

> Thanks @sureshjoshi! Quite a few of the changes are just mechanical -- moving files related to buf linting/formatting that were under a subdirectory that wouldn't make sense if buf also does code generation. I could split those into a separate PR but I didn't since there wouldn't be a justification for moving those files without adding the new functionality. Let me know if you have a preference. I can also just put them in a different commit within this PR if it makes review easier.

Several standalone smaller, simpler PRs will always make for an easier review :)

@mfairley
Author

> > Thanks @sureshjoshi! Quite a few of the changes are just mechanical -- moving files related to buf linting/formatting that were under a subdirectory that wouldn't make sense if buf also does code generation. I could split those into a separate PR but I didn't since there wouldn't be a justification for moving those files without adding the new functionality. Let me know if you have a preference. I can also just put them in a different commit within this PR if it makes review easier.
>
> Several standalone smaller, simpler PRs will always make for an easier review :)

Sounds good. I'll make a couple more PRs and then rebase to simplify this one. Here's the first, which bumps the buf version: #23346
